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ABSTRACT 

A wavelet-based image registration approach has previously been proposed by the authors. In this work, wavelet 
coefficient maxima obtained from an orthogonal wavelet decomposition using Daubechies filters were utilized to register 
images in a multi-resolution fashion. Tested on several remote sensing datasets, this method gave very encouraging results. 
Despite the lack of transladon-invariance of these filters, we showed that when using cross-correlaUon as a feature matching 
technique, features of size larger than twice the size of the filters are correctly registered by using the low-frequency subbands 
of the Daubechies wavelet decomposition. Nevertheless, high-frequency subbands are still sensitive to translation effects, hi 
this work we are considering a rotation- and translation-invariant representation developed by E. Simoncelli and integrate it m 
our image registration scheme. The two types of filters, Daubechies and Simoncelli filters, are then being compared from a 
registration point of view, utilizing synthetic data as well as data from the Landsat/ Thematic Mapper (TM) and from the 
NOAA Advanced Very High Resolution Radiometer (AVHRR). 


1. INTRODUCTION 

Automatic registration and resampling of remotely sensed data will be an essential element of future Earth satellite 
observation systems. New remote sensing systems will generate enormous amounts of data representing multiple 
observations of the same features at different times and/or by different sensors with, most often, these sensors being spread 
over multiple platforms. Automatic registration and resampling methods are indispensable for such tasks as data fusion, 
navigation achieving super-resolution, or optimizing communication rates between spacecraft and ground systems, f or all 
these tasks, accurate image registration is the first step, since a number of distortions prevent two images acquired either by 
the same sensor at different times or by two sensors at the same or different times from being perfectly registered to each 
other or to a fixed coordinate system. Distortions usually correspond to orbit and attitude anomalies, but some continuous 
nonlinear distortions are also due to altitude, velocity, yaw, pitch, and roll. It is very difficult to determine exact location 
within an image using only ancillary data and geo-location is usually computed by combining navigation and registration. 
Navigation corresponds to a “systematic correction” based on image acquisition models taking into account satellite orbit and 
attitude, sensor characteristics, platform/sensor relationship, Earth surface and terrain models and brings the registration 
accuracy within a few pixels. Image registration, on the other hand, corresponds to a “precision correction based on 
landmarks and image features, and refines the geolocation to a subpixel accuracy. Registration is either applied after the 
navigation process, or both processes are integrated in a closed feedback loop. In this paper we will only consider the issue of 
feature-based, precision-correction automatic image registration. 

In general, image registration can be defined as the process which determines the best match of two or more images 
acquired at the same or at different times by different or identical sensors. One set of data is taken as the reference data, and 
all other data, called input data (or sensed data), is matched relative to the reference data. The general process of image 
registration includes three main steps: (1) the extraction of features to be used in the matching process (2) the feanue 
matching strategy and metrics, and (3) the resampling of the data based on the correspondence computed from matched 
features This paper only deals with steps (1) and (2). Currently, the most common approach to satellite image registrauon is 
to perform step (1) manually by interactive extraction of a few outstanding characteristics of the data, which are called control 
points (CP's) tie-points, or reference points. The CP's in both images (or image and map) are matched by pair and used to 
compute the parameters of a geometric transformation. But such a point selection represents a repetitive, labor- and time- 
intensive task which becomes prohibitive for targe amounts of data, and often leads to large registration errors [1]. A large 
number of automatic image registration methods have been proposed and surveys can be found in [2-4] Some of the features 
which are being utilized for step (1) are: original gray levels, edges, regions, and more recently wavelet features. According to 
[21, step (2) itself can be separated into: 



• the search space, i.e. the class of potential transformations that establish the correspondence between input data and 
reference data. Transformations that are often used arc rigid transformations (composed of a scaling, a translation and a 
rotation), affine transformations (composed of a rigid transformation, a shear and an aspect-ratio change, a shear in the x- 
axis transforms the x-coordinate into a linear combination of both x and y-coordinates, and the aspect-ratio is denned as the 
numerical ratio of image width to height), and polynomial transformations. 

• a search strategy, which is used to choose which transformations have to be computed and evaluated. Local or global 
search multi-resolution search or optimization techniques are examples of various search strategies. 

• a similarity metric, which evaluates the match between input data and transformed reference data for a given transformation 
chosen in the search space. Correlation measurement has been the most often used but other methods such as a Hausdorff 
distance [5] can also be utilized. 

A wavelet- based image registration approach has previously been proposed by the authors [4,6,7], In this work wavelet 
coefficient maxima, obtained from an orthogonal wavelet decomposition using Daubechies filters [8], were utilized to register 
images in a multi-resolution fashion. Tested on several remote sensing datasets, this method gave very encouraging results. In 
the study reported in [9], we showed that when using cross-correlation as a feature matching technique, features of size larger 
than twice the size of the filters are correctly registered using the low-frequency subbands of the Daubechies wavelet 
decomposition. Nevertheless, features extracted from the high-frequency subbands are still sensitive to translation effects. 

In this work we are utilizing filters developed by E. Simoncelli [10-12] and we integrate them in our wavelet-based 
image registration scheme. The two types of filters, Daubechies and Simoncelli filters, are then being compared from a 
registration point of view, utilizing synthetic data as well as data from the Landsat/ Thematic Mapper (TM) and from the 
NOAA Advanced Very High Resolution Radiometer (AVHRR). Results are presented in section 3. 


2. SOME BACKGROUND ON WAVELET-BASED 
IMAGE REGISTRATION OF SATELLITE IMAGERY 

2 « Previous Wavelet-Based Image Registration Method ...... 

Wavelet transforms provide a space-frequency representation of an image. In a wavelet representation, the original signa 
is filtered by the translations and the dilations of a basic function, called the “mother wavelet”. In our wavelet-based 
registration, only discrete orthonormal bases of wavelets have been considered and are implemented by filtenng, separately in 
rows and in columns, the original image by a high-pass and a low-pass filter, thus in a multi-resolution fashion [13]. At eac 
level of decomposition, four new images are computed; each of these images is a quarter the size of the prevtous ongina 
image and represents the low frequency or high frequency information of the image in the horizontal and/or the vertical 
directions; images LL (Low/Low), LH (Low/High), HL(High/Low), and HH (High/High). Starting again from the 
"compressed" image (or image representing the low-frequency information, “LL”), the process can be iterated, thus building a 
hierarchy of lower and lower resolution images. Figure 1 summarizes the multi-resolution decomposition. 


Original or 
Previous 
Low-Pass 
Results, Ik 


Rows 



Columns 



Decimat i 
Rows -| 
by 2 





Decimal 

L 

H 

— 

Rows 




by 2 



.Hk+i 




uecimai ; 
Rows - 
by 2 


L HLk+i 




uecimat 



Rows 


by 2 

, j 


HHk+J 


Figure 1 

Decomposition by an Orthonormal Basis of Wavelets 













Our wavelet-based method represents a three-step approach to automatic registration of remote sensing imagery. The first 
step involves the wavelet decomposition of the reference and input images to be registered. In the second step, we extract at 
each level of decomposition domain-independent features from both reference and input images. Finally, we utilize these 
features to compute the transformation function by following the multiresolution approach provided by the wavelet 
decomposition. Following the registration framework described in the previous section, our algorithm will utilize the four 
following components: 

• The feature space 

Features are either chosen as the gray levels provided by the low-frequency LL compressed versions of the original image 
(for non-noisy images), or are based on the high-frequency information (e.g., maxima points of LH and HL images) extracted 
from the wavelet decomposition. In this second option, only those points whose intensities belong to the top x% of the 
histograms of these images are kept (x being a parameter of the program whose selection can be automatic); we call these 
points “maxima of the wavelet coefficients.” These maxima are computed for all levels of the wavelet decomposition, for 
reference as well as input images. 

• The search space 

In a first step, we assume the transformation to be either a rigid or an affine transformation. Both types of 
transformations include compositions of translations and rotations; therefore, as a preliminary study, our search space is 
composed of 2-D rotations and translations, and will be extended later to rigid and affine transformations. We look for 
rotations with angles included in the interval [0,90degrees] and for translations in the interval [0, half pixel-size of reference 
image]. 

• The search strategy 

Our search strategy follows the multi-resolution wavelet decomposition, iteratively from the deepest level of 
decomposition (where the image size is the smallest), until the first top level of decomposition, i.e. going from low 
resolution up to high resolution. For all levels of decomposition, the subband images of the reference image are successively 
transformed by all the transformations included in the search space. Then maxima of the transformed reference wavelet images 
and of the input wavelet images are extracted. The accuracy of this search increases when going from low resolution to high 
resolution. At each level the search focuses in an interval around the "best" transformation found at the previous level with an 
accuracy D and is refined at the next level up with an accuracy D/2. More details on this algorithm can be found in [4,6,7]. 

• The similarity metric 

At each level of decomposition and for each of the transformations, a correlation measure is computed between 
transformed reference maxima and input maxima. Another measure, based on a generalized Hausdorff distance, has also been 
studied and very encouraging results are reported in [5]. 

The previous algorithm is summarized in Figure 2. Tested on several datasets, the wavelet-correlation-based method 
described in this section performs with an accuracy of 1.66 pixels [14]. When using a statistically robust matching method 
based on a generalized Hausdorff distance, the first tests show that a subpixel accuracy can be obtained. More details on the 
results can be found in [4,14,15]. 



Figure 2 

Summary of Our Wavelet-Correlation Image Registration Method 


2.b Rotation and Translation Invariance Issues 

According to the Nyquist criterion, in order to distinguish between all frequency components and to avoid aliasing, the 
signal must be sampled at least twice the frequency of the highest frequency component. Therefore, as pointed out in [10], 




“translation invariance cannot be expected in a system based on convolution and subsampling.” When using a separable 
orthogonal wavelet transform (described in Figure 1), information about the signal changes within or across subbands. By 
lack of translation invariance, we mean that the wavelet transform does not commute with the translation operator, and 
similar remarks can be made relative to the rotation operator. Following these remarks, we conducted a study where the use of 
wavelet subbands is quantitatively assessed as a function of features’ sizes. The study reported in [9] shows that when using 
cross-correlation, the method described in 2. a is still a useful registration scheme in spite of translation effects. The results are 
summarized here, see [9] for more details: 

• the low-pass subband is relatively insensitive to translation, provided that the features of interest have an extent at least 
twice the size of the wavelet filters. 

• the high-pass subband is more sensitive to translation, but the peak correlations are still high enough to be useful. 

Following this study, the work presented in this paper only considers the high-pass subbands and look at rotation- and 
translation-invariant filters [10] in order to create the feature space. Although the scheme described in Figure 2 would be more 
optimal if a different similarity metrics were used, we keep the same correlation framework for the only purpose of comparing 
Daubechies and Simoncelli’s filters under the same conditions and for registration purposes. Experiments involving a different 
search strategy and different similarity metrics are currently being performed as a continuation of this work. 


3. USE OF A ROTATION- AND TRANSLATION-INVARIANT REPRESENTATION 

FOR IMAGE REGISTRATION 

3.a Rotation- and Translation Invariant Representation 

The method described by E. Simoncelli in [10-12] enables to build translation- and rotation-invariant filters by relaxing 
the critical sampling condition of the wavelet transforms. By invariance, it is meant that the information contained in a given 
subband will be invariant to translation or rotation. The resulting representation is equivalent to an overcomplete wavelet 
transform, it is not an orthogonal representation but is an approximation of a “tight-frame” [8], i.e. invertible. The Steerable 
Pyramid described in [10] is summarized in Figure 3, where only the analysis decomposition is shown. HO is a high-pass 
filter, L0 and LI are two low-pass filters, and [BO, ..., Bk] represents a set of oriented band-pass filters which ensure the 
representation to be rotation-invariant. In order to ensure translation-invariance, the output of the high-pass filter and of the 
band-pass filters are not subsampled. In addition, the portion of the signal which is iteratively decomposed by the band-pass 
and the low-pass filters does not contain the larger high frequency components and has been preprocessed by the low-pass 
filter, L0, thus removing most aliased components. 
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Figure 3 

Decomposition by a Steerable Pyramid 

As stated in [10], this representation is overcomplete by a factor of 4k/3, where k is the number of oriented band-pass 
filters. In the experiments shown below, in order to optimize the computational speed, we chose k=l. The decomposition was 
iterated 3 times and the subbands which were considered for feature selection in the registration algorithm are {S0,S1,S2,S3} 
as shown in Figure 4. 








Figure 4 

3-Level Decomposition by a Steerable Pyramid Using Only l Oriented Band-Pass Filter 


3.b Results of the Comparative Study 

3.b.l Description of the Parameters 

As we previously stated, the purpose of this study is to vary the type of features used in the registration process descnbed 

in section 2a and in Figure 2 and observe the results when tested on multiple datasets. 

Using the Daubechies filters and the separable orthogonal decomposition of Figure 1 three levels of decomposition are 
processed and the feature space is composed of the maxima of images {LH2.HL2}, {LH1.HL1}, and {LHO.HLO} for each 
respective refinement iteration. These images correspond to a decimation by 8, 4, and 2 of the onginal image, respective y. 

Using the Simoncelli filters and the Steerable Pyramid decomposition of Figure 4, three levels of decomposition are 
processed and the feature space is composed of the maxima of images {S3},{S2}, and {SI}. These images correspond to a 
decimation of 8, 4, and 2 of the original image, respectively. Although using the features provided by SO would signifi^Uy 
improve the results (since it is of size identical to the original’s), this information has been purposely left out in order to 
provide a consistent comparison of the results between the two types of filters. 

Since at each level, Daubechies’ s features are obtained from two different subbands and Simoncelli s features from only 
one subband, the maxima extraction threshold has been tested at { 15%, 20%, 25%} for Daubechies s features and at 
{30%,40%,50%}, respectively, for Simoncelli’s features, thus keeping the same number of features to be correlated in both 

cases. 

3.b.2 Description of the Test Datasets 

The tests were performed on eight different datasets which are also illustrated in Figures 5 to . f 

Dataset # 1 , "SYNTH". A 512x512 synthetic image formed of various geometric shapes was created L Used as a 
image transformed images are generated by applying compositions of rotations R(0) and T=(Tx,Ty) where 
0=(0, 5,10,15,20,25} degrees and Tx,Ty={5, 10, 15,20} pixels. This results in a dataset of 102 images incuding the reference 

'Damsels #2-#5, "NSYNTH.G2 ”, "NSYNTH.G5 ”, "NSYNTH.G10”, "NSYNTH.G20”. The previous dataset was altered by 

white Gaussian noise of variance 2,5,10, and 20, respectively. . i; , 

Dataset #6, "GIRL". Die reference image for this dataset is a 512x512 dig.tn red photO£aph 

noise. The transformations of the reference image include the set of translations { ( 6,4),(M0),(12,8),( ’ ^ ' I’L , ’ -J ’ 

rotations of angles (5,10,15,20,25) degrees, and the same rotations combined with the translations {(2, 4), (6, 4), (20, 60)}. This 

The reference TM image is a 512x512 portion of Band 2 of a Landsat-Themadc Mapper (TM) scene over 
the Pacific Northwest. Transformations are identical to the ones described for dataset #6, with 27 riles. 

Dataset #8, "AVHRR". This dataset is much smaller but represents a “real-life dataset”. It is a series of th'rteen 512 rows y 
1024 columns AVHRR/LAC images over South Africa. Raw AVHRR data are navigated and georeferenced to a geograp c 





grid that extends from -30.20 S. 15.39 E (upper left) -34.79 S, 24.59 E (lower right). The navigation process uses an orbital 
model developed at the University of Colorado 1 16| and assumes a mean attitude behavior (roll, pitch and yaw) derived using 
Ground Control Points [17], A map of the coastline derived from the Digital Chart ot the World (DCW) is generated for the 
same geographic grid and is also available for this dataset. In this case, the actual transformation is not known, but results of 
a manual registration are used to assess the accuracy ot the automatic registration. 


Registration results obtained with the two different types of filters are summarized in Tables 1 and 2 and graphically 
represented in Figure 10. These results show that the two types of filters give similar results for ideal or low-noise images 
but as soon as the noise level increases, the registration accuracy is much more stable with Simoncelli’s filters. Overall the 
translation accuracy obtained with these filters stays around 1 pixel, and even reaches subpixel accuracy for the “GIRL” and 
“TM” datasets; while the accuracy using Daubechies’ filters greatly varies depending on the contents of the images. The 
results are consistent for all levels of thresholds chosen in the maxima selection, even when the noise level increases. 

4. CONCLUSION AND FUTURE WORK 

This paper presented an image registration method based on wavelets and overcomplete wavelets. It was shown that, as 
expected and due to their translation- and rotation- invariance, Simoncelli’s filters perform better than Daubechies’ filters. 
Quantitative mesaurements support this conclusion. 

As we recognize that the exhaustive search involving multiple cross-correlations is not optimal, we are looking at 
other search strategies and similarity metrics, such as optimization and robust matching. Future work will also exploit the 
full rotation-invariance capability of the steerable filters by varying the number of band-pass filters. 



Figure 5 

Dataset #1 (“SYNTH”) 
Reference and Some Transformations 


Figure 6 

Dataset #5 (“NSYNTH.G20”) 
Reference and Some Transformations 


















Figure 7 - Dataset #6 (“GIRL”) 
Reference and Some Transformations 


Figure 8 - Dataset #7 (“ TM ”) 
Reference and Some Transformations 



Figure 9 - Dataset #S (“ A VHRR") 

Series of Thirteen Multi-Temporal AVtlRR Images 
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Registration Results for Both Types of Filters on Synthetic Datasets #l-#5 
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Registration Results for Both Types of Filters on Datasets #6 -ft 8 and Overall Results 
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Figure 10 

Summary of Results for Translation Accuracy 





